Abstract. We study optimal behavior of energy producers under a CO2 emission abatement program.
We focus on a two-player discrete-time model where each producer is sequentially optimizing
her emission and production schedules. The game-theoretic aspect is captured through a
reduced-form price-impact model for the CO2 allowance price. Such duopolistic competition results
in a new type of non-zero-sum stochastic switching game on finite horizon. Existence of game Nash
equilibria is established through generalization to randomized switching strategies. No uniqueness
is possible and we therefore consider a variety of correlated equilibrium mechanisms. We prove
existence of correlated equilibrium points in switching games and give a recursive description of
equilibrium game values. A simulation-based algorithm to solve for the game values is constructed
and a numerical example is presented.


1. Introduction 


In this paper we study a new class of non-zero-sum stochastic switching games with continuous 
state-space. Such games have natural applications in economics and finance, in particular for 
describing oligopolistic competition between large commodity producers. Our analysis is motivated 
by the CO2 cap-and-trade markets and provides new quantitative insight into the game-theoretic 
aspects of these schemes.

Switching games are a special class of dynamic non-zero-sum state-space games. They are
characterized by a finite number of system states u, jointly selected by the players. The players
dynamically react to actions of other players and the evolution of state variables, represented as
controlled stochastic processes, by strategically modifying the system state. Our contribution is a
first rigorous probabilistic analysis of switching games. Because multiple game Nash equilibria are
possible in our model, we propose to apply the wider concept of correlated equilibria. Correlated
equilibria give a clear financial mechanism for stepwise equilibrium selection. Our key result is
the construction of correlated equilibria in switching games in Section 3.3. The resulting
representation in Theorem 3.4 of switching games in terms of a recursive sequence of stopping games
leads to a constructive characterization of equilibrium strategies. Namely, we prove the analogue of
the dynamic programming equation for the game values, which enables numerical solution through
backward recursion. Thus, the complexity of switching games is only slightly higher than that of
regular optimal switching problems.
In terms of existing literature this paper extends two separate strands of research. Work on
stochastic zero-sum stopping games dates back to Dynkin [14]. Such Dynkin games were
progressively generalized in [H HU El 113 HO]. Later extensions also treated special cases of
non-zero-sum stopping games, especially the so-called monotone type [22 E21 E3]. The key tool of
correlated equilibria in stochastic dynamic games was studied by [UJ E51 EH [371 ESI EH]. We
augment these results by explicitly characterizing correlated equilibria in repeated stopping games
using the methods of Ramsey and Szajowski [35]. Contemporaneously, the theory of optimal switching
for a single agent was developed and extensively studied in the past decade, see [8j [T21 [201 Ell.
In Section 3.3 we extend these results to a game setting by showing that at switching game
equilibrium each player faces an optimal switching problem with randomized controls.

Another contribution of this work is a construction of numerical schemes to compute game values
and equilibrium strategies of switching games. This is achieved by combining backward recursion 
together with sequential solution of local 2x2 games. We suggest two approaches, one based 
on the Markov chain approximation method and a second algorithm that relies on least squares 
Monte Carlo. The latter is a novel extension of our previous work in [HI [28] and borrows ideas 
from standard optimal stopping theory to implement the analogue of the dynamic programming 
recursion on a set of Monte Carlo simulations. 

A significant portion of the paper is dedicated to the application of our model to emissions 
trading. With imminent ramping up of CO2-emissions markets around the world (see e.g. the
Western Climate Initiative in the US and the EU ETS Phase III in Europe both set to start in 
2012), it is crucial to understand energy producer behavior under the new frameworks. By design, 
the carbon allowances will be scarce and market participants will be competing for finite permit 
resources. Our analysis is a first pass at oligopolistic competition in CO2 markets using game- 
theoretic methods. We hope it can serve as a stepping-stone to more sophisticated modeling that 
addresses market design and comparative statics of our framework. 

The rest of the paper is organized as follows. In Section 2 we define the precise stochastic model
representing oligopolistic competition in CO2 markets. Section 3 constructs the representation of
switching games in terms of repeated stopping games and culminates with Theorem 3.4 that
establishes the dynamic programming equations. Section 4 describes our numerical solution algorithms
and presents a computational example. Finally, Section 5 discusses extensions of the model and
points to directions for future work.

2. Competitive Equilibrium among Oligopolistic Producers 

In this section we formally define a model for the competitive dynamic equilibrium between 
producers with market power. 



2.1. Price Dynamics under Cap-and-Trade Emission Schemes. The carbon allowance markets
are intrinsically linked to other energy markets, notably electricity whose production accounts
for the bulk of regulated emissions. Furthermore, due to their size, major electricity producers often
have the ability to dramatically move carbon prices based on their emission schedules. Hence, to
understand equilibria in CO2 markets it is useful to consider them from the point of view of large
traders. The model proposed below captures this phenomenon for a joint electricity-carbon market.

To fix ideas, start with a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$,
$t \in \mathcal{T} = \{0, 1, \ldots, T\}$. The terminal time T is the expiration date of the current
vintage of permits. We consider two heterogeneous producers (henceforth termed players) who each
produce commodity P (electricity) and consume commodity X (carbon allowances). These two producers
generate "dirty" electricity from e.g. coal or gas and can be viewed as representative agents of a
park of power plants with identical engineering characteristics. All the other participants in the
electricity and CO2 markets are not modeled explicitly; rather we postulate that their collective
actions induce stochastic fluctuations in the prices of P and X. We assume that the two players are
large traders in the carbon market, but small players on the electricity market. This reflects the
fact that the other "green" producers (who use nuclear, hydroelectric, renewable, etc. sources)
create a competitive electricity market while remaining passive in the CO2-permits arena.

The electricity price is given by the exogenous discrete-time stochastic process $(P_t)$, for
simplicity taken to be one-dimensional,

$P_{t+1} = G(P_t, \epsilon^P_t)$,

where the innovations $(\epsilon^P_t)$ are i.i.d. standard Gaussian. Our canonical example is the
logarithmic Ornstein-Uhlenbeck process which is a log-normal stationary Gaussian process with

(1) $P_{t+1} = P_t \cdot \exp\big(\kappa_P(\bar{P} - \log P_t) + \sigma_P \epsilon^P_t\big)$, $\epsilon^P_t \sim \mathcal{N}(0,1)$,

for some positive constants $\kappa_P, \bar{P}, \sigma_P$.
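
As an illustration of (1), the following minimal sketch simulates one path of the logarithmic
Ornstein-Uhlenbeck price. The function name `simulate_log_ou` and all parameter values are
hypothetical choices made for this example only; note that in (1) the constant $\bar{P}$ is
compared with $\log P_t$, so it is passed here as a log-level.

```python
import numpy as np

def simulate_log_ou(p0, kappa, p_bar, sigma, n_steps, rng):
    """Simulate P_{t+1} = P_t * exp(kappa * (p_bar - log P_t) + sigma * eps_t), as in (1)."""
    path = np.empty(n_steps + 1)
    path[0] = p0
    eps = rng.standard_normal(n_steps)  # i.i.d. standard Gaussian innovations
    for t in range(n_steps):
        drift = kappa * (p_bar - np.log(path[t]))  # pull of log-price towards p_bar
        path[t + 1] = path[t] * np.exp(drift + sigma * eps[t])
    return path

# Illustrative (assumed) parameters: long-run level 50, moderate mean reversion.
rng = np.random.default_rng(0)
path = simulate_log_ou(p0=50.0, kappa=0.2, p_bar=np.log(50.0), sigma=0.1,
                       n_steps=250, rng=rng)
```

Setting `sigma=0` removes the noise, in which case the log-price converges geometrically to
`p_bar` at rate `1 - kappa`.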

The objective of the producers is to maximize their expected net profits over the planning horizon
T. The producers' profit is given by their clean dark spread [17] which is defined as the difference
between electricity price and the carbon-adjusted production cost. We assume that input fuel costs
are fixed, as is often the case for power generators with long-term supply contracts. The strategy of
each player is described by a repeated start-up/shut-down option. Namely, if the market conditions
are unfavorable, a player can stop production, eliminate CO2 emissions and avoid losses; she can
then restart production when the profit spread improves. As a first approximation we assume that
these choices of production regimes are binary and denoted as "off" (0) and "on" (1). Formally, the
production schedule of each producer is described by a stochastic process $u_i$, $u_i(t) \in \{0, 1\}$,
$t \in \mathcal{T}$. In a single-player model, such timing optionality is known as a real option and
has been thoroughly investigated since the seminal work of jH [13]. Repeated real options have
attracted considerable attention recently, see [H [201 [28], and others.

Remark 2.1. An alternative formulation is to take $u_i$ to be continuous, so that the producers
can choose emissions levels smoothly. This would lead to a non-zero-sum stochastic dynamic
game. Such models have been extensively studied both in discrete and continuous time, see [22]
[30] and references therein. While presenting other formidable technical challenges, the problem
of equilibrium selection is less severe with continuous controls thanks to the convexity of value
functions. In this work we focus on the timing flexibility and therefore maintain the discrete
control space.

MICHAEL LUDKOVSKI

Let $X_t$ be the permit price at date t. Based on the above discussion, the actions of each player
influence the dynamics of $X_t$. Namely, conditional on player actions $u_1(t), u_2(t)$, we model
$X_t$ as another mean-reverting process with a policy-dependent mean and log-Gaussian increments,

(2) $X_{t+1} = X_t \cdot \exp\big(\kappa_X (f(u_1(t), u_2(t)) - \log X_t) + \sigma_X \epsilon^X_t\big)$ with
    $f(u_1, u_2) = \log(\bar{X} + g_1 u_1 + g_2 u_2)$.

The sequence $(\epsilon^X_t)$ is again Gaussian, with correlation parameter $\rho$ to $(\epsilon^P_t)$,
i.e. $\epsilon^X_t = \rho\, \epsilon^P_t + \sqrt{1 - \rho^2}\, \epsilon^\perp_t$ with
$\epsilon^\perp_t \sim \mathcal{N}(0, 1)$ independent of $\epsilon^P_t$. Rising electricity prices are
likely to increase the overall CO2 emission rates and therefore we expect that P and X are
positively correlated, $\rho > 0$.
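
To make the price impact in (2) concrete, here is a minimal sketch of one joint transition of
the electricity and permit prices given the current actions of the two producers. The function
name `step_prices`, the dictionary keys and all numerical values are assumptions made for
illustration; they are not calibrated to any market.

```python
import numpy as np

def step_prices(p, x, u1, u2, params, rng):
    """One step of (P_t, X_t) following (1)-(2), given actions u1, u2 in {0, 1}."""
    eps_p = rng.standard_normal()
    eps_perp = rng.standard_normal()
    rho = params["rho"]
    # Correlated innovation: eps_x = rho * eps_p + sqrt(1 - rho^2) * eps_perp
    eps_x = rho * eps_p + np.sqrt(1.0 - rho ** 2) * eps_perp
    p_next = p * np.exp(params["kappa_p"] * (params["p_bar"] - np.log(p))
                        + params["sigma_p"] * eps_p)
    # Policy-dependent mean level f(u1, u2) = log(x_bar + g1*u1 + g2*u2)
    f = np.log(params["x_bar"] + params["g1"] * u1 + params["g2"] * u2)
    x_next = x * np.exp(params["kappa_x"] * (f - np.log(x))
                        + params["sigma_x"] * eps_x)
    return p_next, x_next

# Illustrative (assumed) parameters; emitting raises the mean permit price.
demo_params = dict(kappa_p=0.2, p_bar=np.log(50.0), sigma_p=0.1,
                   kappa_x=0.3, x_bar=10.0, g1=2.0, g2=3.0, sigma_x=0.2, rho=0.5)
p1, x1 = step_prices(50.0, 12.0, 1, 0, demo_params, np.random.default_rng(0))
```

With both volatilities set to zero and `kappa_x=1`, the permit price jumps in one step to
`x_bar + g1*u1 + g2*u2`, which isolates the effect of the two emission decisions.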

Remark 2.2. To motivate (2), we recall from [18] that in a carbon market
"$X_t = x\, \mathbb{P}\{C_T > c \mid \mathcal{F}_t\}$," where $C_T$ is the cumulative total CO2
emissions on [0, T], x is the penalty for going over the allowance limit, c is the total amount of
allowances allocated and $\mathbb{P}$ is the equilibrium pricing measure. We postulate that
$C_t = \sum_{s=0}^{t} \{b_1 u_1(s) + b_2 u_2(s) + \nu(s)\}$, where $b_i u_i(s)$ are the emissions
of producer i in period s and $\nu(s)$ are the emissions by all other market participants. Assuming
independent increments (due to external shocks such as weather effects, etc.) in $\nu(s)$, the
dynamics (2) follow, with some complicated and time-dependent functions $f(t, \cdot)$ and volatility
$\sigma_X(t)$. In (2) we give a simplified or reduced-form version of this description to capture
the temporal feedback between the $u_i$'s and $X_t$. If the supply curve for the CO2 allowances is
convex, then the price impact $f(u_1, u_2)$ in (2) would be nonlinear and further magnify the
competitive effects.

2.2. Optimization Objective. We assume that the producers have zero allowance allocations
and cannot bank allowances; therefore they must purchase the requisite allowances at each stage
of the game. The total P&L of the players then consists of (i) revenue from selling electricity,
minus the (ii) cost of buying emission allowances, as well as (iii) operational costs due to adopted
strategy $u_i$. An important case of operational costs are fixed switching costs $K_{\{i,j_1,j_2\}}$
that are paid each time the production regime of agent i is changed from $j_1$ to $j_2$ and
correspond to the ramping-up/winding-down costs associated with the electricity turbines 0H7]. We
postulate $K_{\{i,j,j\}} = 0$ for all j and the triangle inequality
$K_{\{i,j,\ell\}} \le K_{\{i,j,k\}} + K_{\{i,k,\ell\}}$ for all $j, k, \ell$.

Let $\mathbb{P}^{u}$ be the law of $(P_t, X_t)$ given a strategy pair $u = (u_1(t), u_2(t))_{t=0}^{T}$.
The expected cumulative net profit of producer i starting with $P_s = p$, $X_s = x$ and initial
production regime $\zeta \in \{0, 1\}^2$ is

(3) $V_i(s, p, x, \zeta; u) \triangleq \mathbb{E}^{u}\Big[ \sum_{t=s}^{T-1} \big\{ (a_i P_t - b_i X_t - c_i)\, u_i(t) - K_{\{i, u_i(t-), u_i(t)\}} \big\} \,\Big|\, P_s = p, X_s = x, u(s-) = \zeta \Big]$.

The constants $a_i, b_i, c_i, K_{\{i,j_1,j_2\}}$, i = 1, 2, represent the maximum quantity of
electricity produced by the facility in one period, the amount of corresponding CO2 allowances
needed, fixed production costs and switching costs, respectively. Due to switching costs, current
production regime is also a state variable. Below, the theorems on existence of equilibria in
stochastic games require bounded payoffs; therefore we assume that profits are truncated from above
at some large positive level.
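
The quantity inside the expectation in (3) can be evaluated path by path. The sketch below
computes the realized net profit of one producer for a given price path and a given on/off
schedule; the function name `realized_profit` and the numbers in the example are hypothetical,
and the truncation of profits mentioned above is not applied.

```python
def realized_profit(prices_p, prices_x, schedule, initial_regime, a, b, c, switch_cost):
    """Pathwise sum in (3): clean spread while 'on', minus switching costs.

    switch_cost[j1][j2] plays the role of K_{i, j1, j2}, with zero diagonal.
    """
    total = 0.0
    previous = initial_regime
    for p, x, u in zip(prices_p, prices_x, schedule):
        total += (a * p - b * x - c) * u      # spread is earned only when u = 1
        total -= switch_cost[previous][u]     # paid only when the regime changes
        previous = u
    return total

# Start offline, switch on at t=0, stay on at t=1, switch off at t=2.
K = [[0.0, 0.5], [0.25, 0.0]]
profit = realized_profit([10.0, 10.0, 10.0], [2.0, 2.0, 2.0], [1, 1, 0],
                         initial_regime=0, a=1.0, b=1.0, c=3.0, switch_cost=K)
# profit == (5 - 0.5) + 5 + (0 - 0.25) == 9.25
```

Averaging this quantity over simulated paths of $(P_t, X_t)$ under a fixed strategy pair would
give a Monte Carlo estimate of $V_i$ for that pair.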

In the duopoly setting, while each player aims to maximize her own profits, the competitor's
actions will also affect her decisions. Indeed, emissions today shrink remaining permit supplies and
tend to increase future CO2 prices. Therefore, if player 1 is emitting, player 2's expected future
profits are reduced. Overall, the producers are facing a stochastic game where actions correspond
to the latest choice of production regime by each player and payoffs are a function of the exogenous
$P_t$ and the partly controlled $X_t$. Since the game is stochastic and multi-period with Markov state
variables, we restrict our attention to Markovian (feedback) equilibria. Our main task for the
remainder of the paper is to characterize such game equilibria and then compute the corresponding
game value functions (i.e. expected profits) $V_i$ and equilibrium emission schedules $(u_1^*, u_2^*)$.

2.3. Randomized Emission Schedules. The strategies $u_i$ may be mixed or randomized, i.e. $u_i(t)$
is not necessarily adapted to the market filtration $\mathbb{F}$. However, we also assume a
full-information setting, whereby the emission schedules of each agent are publicly known after the
fact. Accordingly, the market observables are
$\mathcal{F}_t = \sigma(X_0, P_0, u_1(0), u_2(0), \ldots, u_1(t-1), u_2(t-1), X_t, P_t)$, the filtration
generated by the price histories and past actions. An $\mathbb{F}$-randomized emission strategy is a
pair $(u_i(t), \mathcal{G}^i(t))$ where $\mathcal{G}^i$ is an independent enlargement of the filtration
$\mathbb{F}$ (i.e. $\mathbb{P}(A \mid \mathcal{F}_t) = \mathbb{P}(A \mid \mathcal{G}^i_t)$ for all
$A \in \mathcal{F}_T$) and $u_i$ is $\mathcal{G}^i$-adapted. Let

(4) $p_i(t) \triangleq \mathbb{P}(u_i(t) = 1 \mid \mathcal{F}_t)$

denote the stage-t probability that the control will be 'on', given observable information so far. If
$p_i(t) \in \{0, 1\}$ then the strategy is pure at stage t, otherwise it is mixed and can be
represented via a randomization parameter $\eta_i(t)$ as

(5) $u_i(t) = 1_{\{\eta_i(t) < p_i(t)\}}$, $\eta_i(t) \sim \mathrm{Unif}(0, 1)$, $\eta_i(t) \perp \mathcal{F}_t$.

The full mixed strategy is the vector $\pi_i(t) = (1 - p_i(t), p_i(t))$ belonging to the 2-simplex
$\pi_i(t) \in \Delta_2 = \{(\pi^0, \pi^1) : \pi^j \ge 0, \pi^0 + \pi^1 = 1\}$. The joint action is given
by the strategy profile $\pi(t)$, with $\pi_i^j(t)$ denoting the probability that player i emits at
level j.

The set $\mathcal{U}_i$ of admissible production schedules for player i consists of
$\mathbb{F}$-randomized {0, 1}-valued processes and can be canonically identified with an
$\mathbb{F}$-adapted process $(p_i(t))$, $0 \le p_i(t) \le 1$ and independent sequence $\eta_i(t)$ as
in (5). Let $\mathcal{D}^i(t)$ denote the set of $\mathcal{G}^i$-stopping times bigger than t.
Because $u_i(t) \in \{0, 1\}$, we have a one-to-one correspondence between admissible $u_i$'s and
sequences $(\tau^i_k)_{k=1}^{T}$ satisfying $\tau^i_{k+1} \in \mathcal{D}^i(\tau^i_k)$,

(6) $u_i(t) = \sum_{k=0}^{T} \big[ u_i(0)\, 1_{[\tau^i_{2k}, \tau^i_{2k+1})}(t) + (1 - u_i(0))\, 1_{[\tau^i_{2k+1}, \tau^i_{2k+2})}(t) \big]$, $\tau^i_0 = 0$.

The switching times $\tau^i_k$ encode the times of production regime shifts defined by $u_i$. The
representation (6) holds because at most one regime switch can be made by each player at any given
stage. Indeed, multiple simultaneous regime switches by the same producer are strongly sub-optimal if
$K_{\{i,j_1,j_2\}} > 0$ and weakly suboptimal otherwise. A $\mathcal{G}^i$-adapted stopping time
$\tau$ can also be viewed as a randomized $\mathbb{F}$-stopping time, via its conditional stopping
probabilities $p_t = \mathbb{P}(\tau = t \mid \tau > t - 1, \mathcal{F}_t)$, namely

(7) $\tau(p) = \inf\{t : \eta_t < p_t\}$, $\eta_t \sim \mathrm{Unif}(0, 1) \perp \mathcal{F}_t$.

When $p_t \in \{0, 1\}$ for all t, we are back in the case of regular $\mathbb{F}$-stopping times.
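
The representation (7) translates directly into a sampling rule. The following minimal sketch
draws one realization of a randomized stopping time from a given sequence of conditional stopping
probabilities; the function name `randomized_stopping_time` is hypothetical, and the
probabilities are taken as a fixed array rather than as an adapted process, which is a
simplification made for illustration.

```python
import numpy as np

def randomized_stopping_time(stop_probs, rng):
    """Return inf{t : eta_t < p_t} for i.i.d. eta_t ~ Unif(0, 1), or None if never hit."""
    stop_probs = np.asarray(stop_probs, dtype=float)
    eta = rng.random(stop_probs.size)          # uniform draws on [0, 1)
    hits = np.nonzero(eta < stop_probs)[0]     # stages at which stopping is triggered
    return int(hits[0]) if hits.size else None

rng = np.random.default_rng(0)
tau_pure = randomized_stopping_time([0.0, 0.0, 1.0, 0.0], rng)   # pure rule: stops at t = 2
tau_mixed = randomized_stopping_time([0.3, 0.3, 0.3, 1.0], rng)  # mixed rule: random stage
```

When every entry is 0 or 1 the draw is irrelevant and the rule reduces to an ordinary stopping
time, in line with the remark following (7).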

If $\mathcal{G}^1_t \cap \mathcal{G}^2_t = \mathcal{F}_t$ then the randomizations of the two players
are independent. Alternatively, correlated decision making can be introduced by making the
$\eta_i$'s in (5) dependent. Let $\gamma$ be an $\mathbb{F}$-adapted stochastic process taking values
in $\Delta_4$. Following [35] we interpret $\gamma(t)$ as a weak (stepwise) communication device,
with $\gamma_{ij}(t)$ specifying the probability that player 1 takes action $i \in \{0, 1\}$
and player 2 applies action $j \in \{0, 1\}$,

$\gamma_{ij}(t) = \mathbb{P}(u_1(t) = i, u_2(t) = j \mid \mathcal{F}_t)$.

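The correlation is implemented via a third party that directs the players to implement a particular
action pair through private signals. Namely, the players receive signals

(8) $\mu_1(t, \gamma) = 1_{\{\gamma_{10}(t) + \gamma_{11}(t) < \eta(t)\}}$ and $\mu_2(t, \gamma) = 1_{\{\gamma_{01}(t) + \gamma_{11}(t) < \eta(t)\}}$,

where $\eta \sim \mathrm{Unif}(0, 1)$ is only observed by the third party. Setting
$u_i(t) = \mu_i(t, \gamma)$, the resulting strategy profile is denoted
$u(t, \gamma) = (\mu_1(t, \gamma), \mu_2(t, \gamma))$ and has dependent marginals and joint
distribution $\gamma$. Conditional on the signal at date t, an agent can impute the strategy of the
other player by e.g.
$\pi_2(t, \gamma)|_{\mu_1(t,\gamma)=0} = \Big( \frac{\gamma_{00}(t)}{\gamma_{00}(t) + \gamma_{01}(t)}, \frac{\gamma_{01}(t)}{\gamma_{00}(t) + \gamma_{01}(t)} \Big)$.
With the communication device in place, the space of randomized strategies is now adjusted to
$\mathcal{U}_i(\gamma) = \{(u_i, \mathcal{G}^i) \in \mathcal{U}_i \text{ such that } \mathcal{G}^i_t \supseteq \mathcal{F}_t \vee \sigma(\mu_i(t))\}$.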
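
A minimal sketch of how a third party could issue correlated private signals with joint law
$\gamma$, and how a player would impute the other player's conditional strategy from her own
signal. The action pair is drawn here from a single categorical sample over the four cells of
$\gamma$, which is an illustrative stand-in and not the threshold construction of (8); the
function names `draw_signals` and `implied_strategy` are hypothetical.

```python
import numpy as np

def draw_signals(gamma, rng):
    """Sample (signal to player 1, signal to player 2) with joint distribution gamma[i][j]."""
    gamma = np.asarray(gamma, dtype=float)
    cell = int(rng.choice(4, p=gamma.ravel()))  # cell index = 2*i + j
    return divmod(cell, 2)

def implied_strategy(gamma, player, signal):
    """Conditional law of the other player's action, given this player's own signal.

    Undefined (division by zero) if the signal has zero probability under gamma.
    """
    gamma = np.asarray(gamma, dtype=float)
    weights = gamma[signal, :] if player == 1 else gamma[:, signal]
    return weights / weights.sum()

gamma = [[0.1, 0.4],
         [0.4, 0.1]]
signals = draw_signals(gamma, np.random.default_rng(0))
belief_1 = implied_strategy(gamma, player=1, signal=0)  # proportional to (gamma_00, gamma_01)
```

In this example a player told to stay off assigns probability 0.8 to the other player being on,
which is the kind of information the private signal conveys.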



2.4. Correlated Equilibria in CO2 markets. The introduced correlation mechanism $\gamma$ can be
used to define correlated equilibrium points (CEP) in the CO2 emissions duopoly game. To motivate
the need for such mechanisms, we observe that intuitively the dynamic switching game is a sequence
of one-period bimatrix games. At each stage t, the control $u_i(t) \in \{0, 1\}$ of each player
$i \in \{1, 2\}$ is simply "on/off", leading to the classic 2x2 game. From a dynamic point of view,
the relevant payoff to the players at stage t is then the sum of the current clean spread and the
continuation value that corresponds to the game value that can be realized in the future by the
respective player contingent on current state of the world. In our repeated game setting, the
players must a priori agree on how to implement future equilibria, otherwise the computation of
continuation values would not be possible. Hence, to have a well-defined switching game value, we
need existence-uniqueness of equilibria in the one-period sub-games.

Accordingly, we briefly recall the structure of a 2-by-2 one-shot game. Consider the 2x2 game H
with normal form

(9) $H = \begin{pmatrix} (Z_1^{00}, Z_2^{00}) & (Z_1^{01}, Z_2^{01}) \\ (Z_1^{10}, Z_2^{10}) & (Z_1^{11}, Z_2^{11}) \end{pmatrix}$,

where the rows of H are chosen by player 1, and the columns by player 2. A strategy profile
$(\pi_1^*, \pi_2^*)$ is a Nash equilibrium point (NEP) of H if we have

$\sum_{j,k} \pi_1^{*,j} \pi_2^{*,k} Z_1^{jk} = \sup_{\pi_1 \in \Delta_2} \sum_{j,k} \pi_1^{j} \pi_2^{*,k} Z_1^{jk}$ and
$\sum_{j,k} \pi_1^{*,j} \pi_2^{*,k} Z_2^{jk} = \sup_{\pi_2 \in \Delta_2} \sum_{j,k} \pi_1^{*,j} \pi_2^{k} Z_2^{jk}$.

Hence, $\pi_i^*$ is a best-response for player i, given that the other player uses $\pi_{-i}^*$.
While the classical theorem of Nash shows that a mixed NEP is always available, H may have zero
(strictly competitive case), one (standard case) or two (coordination case) pure NEP's [5].
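
The three cases can be checked by brute force. The sketch below enumerates the pure NEPs of a
2x2 game by testing, for each action pair, that neither player gains from a unilateral
deviation. The function name `pure_neps` and the two example games are hypothetical; ties are
treated as weak best responses, so degenerate games may return more than two pairs.

```python
def pure_neps(z1, z2):
    """Pure Nash equilibria of a 2x2 game.

    z1[j][k], z2[j][k]: payoffs to players 1 and 2 when player 1 plays j (row)
    and player 2 plays k (column).
    """
    return [(j, k) for j in (0, 1) for k in (0, 1)
            if z1[j][k] >= z1[1 - j][k]      # player 1 cannot gain by switching rows
            and z2[j][k] >= z2[j][1 - k]]    # player 2 cannot gain by switching columns

# "Chicken"-type game: emitting alone pays, emitting together hurts both.
chicken_1 = [[0, 0], [2, -1]]
chicken_2 = [[0, 2], [0, -1]]
# Strictly competitive game (matching pennies).
pennies_1 = [[1, -1], [-1, 1]]
pennies_2 = [[-1, 1], [1, -1]]

coordination_case = pure_neps(chicken_1, chicken_2)   # [(0, 1), (1, 0)]
competitive_case = pure_neps(pennies_1, pennies_2)    # []
```

The first example has two pure equilibria in which one producer yields to the other, the second
has none, so only a mixed equilibrium is available.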

Thus, to establish existence of NEP in a multi-period game, one must consider mixed strategies. 
Furthermore, since multiple equilibria are possible, an equilibrium selection rule is needed. In the 
context of the CO2 emission game, because the (P, X)-prices are stochastic, it is impossible to a 
priori rule out some of the above scenarios for all possible state variable realizations. In particular, 
the case of the anti-coordination "battle-of-the-sexes" or "chicken" game is likely to appear when 
the electricity-carbon spread is slightly positive. In such a situation, each of the players will have 
an incentive to emit; however, if the price impact is strong enough, it is not profitable for both of 
them to consume permits. As a result, two pure Nash equilibria are possible whereby one producer 
yields the market to the other. 

The communication device $\gamma$, introduced in one-shot games by [21 [29], provides a general method
for describing such coordination while maintaining the non-cooperative game setting.

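Definition 2.1. A Markovian correlated equilibrium point for the switching game is a Markov
communication device $\gamma : (s, p, x, \zeta) \to \Delta_4$ inducing admissible stage strategy
profiles $u^*(t; \gamma) = (\mu_1(t, \gamma), \mu_2(t, \gamma))$ such that for all
$(s, p_0, x_0, \zeta)$ (recall definition of $V_i$ in (3))

(10) $V_1(s, p_0, x_0, \zeta; u_1^*, u_2^*) \ge V_1(s, p_0, x_0, \zeta; u_1, u_2^*)$ for all $u_1 \in \mathcal{U}_1$,
     $V_2(s, p_0, x_0, \zeta; u_1^*, u_2^*) \ge V_2(s, p_0, x_0, \zeta; u_1^*, u_2)$ for all $u_2 \in \mathcal{U}_2$.

The resulting game values are denoted as $V_i(s, p_0, x_0, \zeta; \gamma)$.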



The meaning of the correlated equilibrium in (10) is that conditional on the private signal
sequence, neither player has an incentive to deviate from the prescribed action. Therefore, given
$\mu_i(t, \gamma)$ and market information $\mathcal{F}_t$, it is optimal to take
$u_i(t) = \mu_i(t, \gamma)$. Note that in (10), even if a player chooses to deviate from the
recommendation $\mu_i(t, \gamma)$ she continues to receive future signals $\mu_i(s, \gamma)$, s > t
and therefore information about the implied strategy of the other player. Existence of CEP of
switching games will be established in Theorem 3.4. We will also provide a recursive construction of
$V_i(t, \cdot)$ in terms of conditional expectations of $V_i(t + 1, \cdot)$ and one-shot 2x2 games.
This allows for a solution method, detailed in Section 4, analogous to the dynamic programming
paradigm for ordinary stochastic control problems. Finally, we will show that a CEP of the switching
game induces rational behavior at each stage, i.e. matches with a CEP of a one-stage sub-game.


Remark 2.3. A related concept of competitive equilibrium in the industrial organization literature
is that of a stage Stackelberg game [3]. In a Stackelberg game, at each stage one player is the leader
and has priority in making decisions; the second player then follows. This description corresponds
closely to the preferential mechanism of equilibrium selection which always favors the leader.

Economically, the third party implementing the correlation could be a government regulator, 
market watchdog, or just a proxy for market frictions that make one equilibrium most preferable. 
Thus, no inherent collusion is required and the game is still non-cooperative. If a regulator is 
involved, a socially beneficial correlation can be selected. For instance, a "utilitarian"
communication device maximizes the (weighted) sum of the firms' continuation values so that the
producers as a whole have best economic health. Alternatively, a "green" device minimizes CO2 emissions.
Finally, a "preferential" communication mechanism can endogenously emerge without a third party 
due to extra advantages available to a given player (e.g. due to preferential regulatory treatment 
or other externalities). 

Remark 2.4. A variety of correlated decision-making is possible in sequential games [25]. Here
we focus on the stepwise weak communication device whereby the players and the regulator
communicate before each stage; such a formulation allows the most flexibility and fits our economic
description. However, in practice much weaker correlation could suffice. For instance, players can
agree at date 0 to use the preferential-i correlation law which means that in any "tie-break" case,
player i "wins". Once this rule is fixed, no further communication would be necessary. Similarly,
if $\gamma$ is such that the implied strategy $\pi_{-i}(t, \gamma)|_{\mu_i(t,\gamma)}$ of the other
player is always pure, then a public randomization is sufficient at each step and no private signals
are needed. Any mixture of NEP's is also a CEP and therefore except for the strictly-competitive
games, one may always find correlation devices that correspond to pure Nash equilibria, obviating
the need for randomization (either by players or regulator).



3. Sequential Stopping Game 

Our analysis of the switching game will consist of building up the solution in several steps. We
begin with analyzing the single-agent objective. Next, in Section 3.2 we move on to the one-shot
non-zero-sum stopping game that is built iteratively from the one-period 2x2 games, following the
methods of [35]. Finally, in Section 3.3 we describe the sequential stopping game that in the limit
is shown in Section 3 to coincide with our original model in (3).



3.1. Single Producer Problem. Before tackling the stochastic duopoly game, let us briefly
review the solution of the single-player model. Since the control $u_i(t)$ takes on a finite number
of values, we have an optimal switching model that can be viewed as a sequence of optimal stopping
problems. Such models (including price impact) were studied in [51 |2"B].

Let us consider the optimization for producer 1. For the remainder of this section we fix a
production schedule $u_2$ of the second producer, as well as a communication device $\gamma$ that
sends private signals $\mu_1(t, \gamma)$ to player 1. In the single-producer problem, the objective
is to maximize the expected profit

(11) $\sup_{(u(t)) \in \mathcal{U}_1(\gamma)} \mathbb{E}^{(u, u_2, \gamma)}\Big[ \sum_{t=s}^{T-1} \big\{ (a_1 P_t - b_1 X_t - c_1)\, u(t) - K_{\{1, u(t-), u(t)\}} \big\} \Big]$.



Consider initial conditions $P_s = p$, $X_s = x$, $u_2(s) = \zeta_2$ and let $V(s, p, x, \zeta_2)$ be
the value function corresponding to (11) conditional on starting in the "on"-production regime, and
$W(s, p, x, \zeta_2)$ the value function starting offline. Furthermore, using same initial
conditions define recursively

(12) $V^0(s, p, x, \zeta_2) = \mathbb{E}^{(1, u_2, \gamma)}\Big[ \sum_{t=s}^{T-1} (a_1 P_t - b_1 X_t - c_1) \Big]$, as well as $W^0(s, p, x, \zeta_2) = 0$;

$V^n(s, p, x, \zeta_2) = \sup_{\tau \in \mathcal{D}^1(s)} \mathbb{E}^{(1, u_2, \gamma)}\Big[ \sum_{t=s}^{\tau-1} (a_1 P_t - b_1 X_t - c_1) + \big(W^{n-1}(\tau, P_\tau, X_\tau, u_2(\tau)) - K_{\{1,1,0\}}\big) \Big]$,

$W^n(s, p, x, \zeta_2) = \sup_{\tau \in \mathcal{D}^1(s)} \mathbb{E}^{(0, u_2, \gamma)}\Big[ V^{n-1}(\tau, P_\tau, X_\tau, u_2(\tau)) - K_{\{1,0,1\}} \Big]$, $n \ge 1$,

where under $\mathbb{P}^{(i, u_2, \gamma)}$ the drift of the carbon allowance price is $f(i, u_2(t))$.
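Proposition 3.1. Let $\mathcal{U}_1^n = \{u \in \mathcal{U}_1 : u \text{ has at most } n \text{ switches}\}$. Then,

$V^n(s, p, x, \zeta_2) = \sup_{(u(t)) \in \mathcal{U}_1^n,\, u(s-) = 1} \mathbb{E}^{(u, u_2, \gamma)}\Big[ \sum_{t=s}^{T-1} \big\{ (a_1 P_t - b_1 X_t - c_1)\, u(t) - K_{\{1, u(t-), u(t)\}} \big\} \Big]$

and as $n \to \infty$, $V^n(s, p, x, \zeta_2) \to V(s, p, x, \zeta_2)$,
$W^n(s, p, x, \zeta_2) \to W(s, p, x, \zeta_2)$ uniformly on compacts.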

Proof. This is an analogue of [8, Theorem 1]. Compared to our earlier work, the only new feature
is that the payoffs to producer 1 are randomized. Indeed, from her perspective, the strategy of
player 2 (implied through the private signal $\mu_1(t, \gamma)$) may be mixed. Consequently, her
continuation value is unknown at decision time, depending as it does on the action of player 2.
Formally, allowing for a relaxed switching control $p^1_s$ at date s (representing probability of
being on) the dynamic programming principle implies that in (12)

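$V^n(s, p, x, \zeta_2) = \mathbb{E}^{\mu_1(s)} \Big\{ \sup_{p^1_s \in [0,1]} \Big[ p^1_s (a_1 p - b_1 x - c_1) - (1 - p^1_s) K_{\{1,1,0\}}$
$\quad + \mathbb{E}^{(\cdot, \gamma)} \big[ p^1_s p^2_s V^n(s+1, P_{s+1}, X^{(1,1)}_{s+1}, 1) + p^1_s (1 - p^2_s) V^n(s+1, P_{s+1}, X^{(1,0)}_{s+1}, 0)$
$\quad + (1 - p^1_s) p^2_s W^{n-1}(s+1, P_{s+1}, X^{(0,1)}_{s+1}, 1) + (1 - p^1_s)(1 - p^2_s) W^{n-1}(s+1, P_{s+1}, X^{(0,0)}_{s+1}, 0) \big] \Big] \Big\}$.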

The outer expectation is averaging over the signal $\mu_1$ whose law is specified by the communication
device $\gamma$; however the decision-maker has access to $\mu_1(t, \gamma)$ and therefore makes the
switching decision $p^1_s$ based on the conditional strategy $(p^2_s)|_{\mu_1(s,\gamma)}$ of player 2.
The inner optimization is linear in $p^1_s$ and therefore the optimizer must be an endpoint of [0, 1].
Thus, as expected, given the signal we can work with pure controls,
$u_1(t) \in \mathcal{F}_t \vee \sigma(\mu_1(t, \gamma))$. Note that from the perspective of
an observer who has access only to $\mathbb{F}$, the strategy of both players appears randomized.

The rest of the proof proceeds exactly as in [8] by iterating over the control decisions of producer
1 using the strong Markov property of (P, X) and the Snell envelope characterization of optimal
stopping problems. □



Proposition 3.1 shows that the solution to (11) can be represented in terms of the sequence
$(V^n, W^n)$ which correspond to optimal stopping problems defined in (12). Taking the limit
$n \to \infty$ we obtain



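Corollary 3.1. (V, W) satisfy the coupled dynamic programming equation:

$V(s, p, x, \zeta_2) = \sup_{\tau \in \mathcal{D}^1(s)} \mathbb{E}^{(1, u_2, \gamma)}\Big[ \sum_{t=s}^{\tau-1} (a_1 P_t - b_1 X_t - c_1) + \big(W(\tau, P_\tau, X_\tau, u_2(\tau)) - K_{\{1,1,0\}}\big) \Big]$,

$W(s, p, x, \zeta_2) = \sup_{\tau \in \mathcal{D}^1(s)} \mathbb{E}^{(0, u_2, \gamma)}\Big[ V(\tau, P_\tau, X_\tau, u_2(\tau)) - K_{\{1,0,1\}} \Big]$.

Moreover, an optimal strategy $u_1^* \in \mathcal{U}_1$ exists.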

3.2. Correlated Equilibria in Non-Zero-Sum Stopping Games. In this section we recall
existing results on two-player non-zero sum stopping games in discrete time and finite horizon. Let
$Z = (Z_i^{jk}(t))$, $i \in \{1, 2\}$, $j, k \in \{0, 1\}$ be an octuple of bounded
$(\mathcal{F}_t)$-adapted stochastic processes. Player $i \in \{1, 2\}$ optimizes the reward

(13) $J_i(s, \tau_1, \tau_2) \triangleq \sum_{t=s}^{(\tau_1 \wedge \tau_2) - 1} Z_i^{00}(t) + Z_i^{10}(\tau_i) 1_{\{\tau_i < \tau_{-i}\}} + Z_i^{01}(\tau_{-i}) 1_{\{\tau_{-i} < \tau_i\}} + Z_i^{11}(\tau_i) 1_{\{\tau_1 = \tau_2\}}$

by choosing the (randomized) $(\mathcal{F}_t)$-stopping time $\tau_i \le T$. In words, $Z_i^{00}$ is
the ongoing reward for staying in the game, $Z_i^{10}$ is the reward if the player stops first;
$Z_i^{01}$ is the reward if the other player stops first and $Z_i^{11}$ is the reward if both players
stop simultaneously. Thus, continuing is associated with action '0' and stopping with action '1'.

The Dynkin zero-sum stopping game corresponds to $Z_1^{10} = -Z_2^{01}$, $Z_1^{01} = -Z_2^{10}$,
$Z_1^{11} = -Z_2^{11}$ and was recently fully analyzed by [16]. Also, the monotone cases
Z® 1 < Zf 1 < Zj° P-a.s. (where both players prefer to stop late) and Zf 1 > Z\ > Zf° were considered
by Ohtsubo [32]. In these special cases, a unique pure Markov NEP exists. The fundamental result of
[32] [33] characterizes game value functions $(V_1, V_2)$ for Z as a pair of $\mathbb{F}$-adapted
processes satisfying $\mathbb{E}[\sup_{0 \le t \le T} |V_i(t)|] < \infty$, $V_i(T) = Z_i^{11}(T)$ and
for all $0 \le t < T$


(14) $(V_1(t), V_2(t)) \in \mathcal{E}\begin{pmatrix} \big(\mathbb{E}[V_1(t+1) \mid \mathcal{F}_t] + Z_1^{00}(t),\ \mathbb{E}[V_2(t+1) \mid \mathcal{F}_t] + Z_2^{00}(t)\big) & \big(Z_1^{01}(t), Z_2^{01}(t)\big) \\ \big(Z_1^{10}(t), Z_2^{10}(t)\big) & \big(Z_1^{11}(t), Z_2^{11}(t)\big) \end{pmatrix}$,

where $\mathcal{E}(H)$ is the set of game values corresponding to NEPs of H. This reduces
computation of game values to iterative solution of one-shot 2x2 games, in complete analogy to
standard dynamic programming. We seek a similar result for the switching game, see (23) below.
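
A minimal sketch of the backward recursion behind (14), under strong simplifying assumptions
that are not in the text: the payoffs are deterministic (so the conditional expectation of
$V_i(t+1)$ is just its value), every stage game is assumed to have a pure NEP, and ties between
equilibria are broken in favor of one designated player. The function names are hypothetical.

```python
def pure_neps(z1, z2):
    """Pure NEPs of a 2x2 game; rows are player 1's actions, columns player 2's."""
    return [(j, k) for j in (0, 1) for k in (0, 1)
            if z1[j][k] >= z1[1 - j][k] and z2[j][k] >= z2[j][1 - k]]

def backward_values(stage_payoffs, terminal, prefer):
    """Game values at time 0 by backward induction over stage games as in (14).

    stage_payoffs: list over t of (z1, z2) with z_i[j][k] the stage-t payoffs
    terminal:      (V_1(T), V_2(T))
    prefer:        1 or 2, the player favored when several pure NEPs exist
    """
    v1, v2 = terminal
    for z1, z2 in reversed(stage_payoffs):
        # Only the (continue, continue) cell carries the continuation value.
        h1 = [[v1 + z1[0][0], z1[0][1]], [z1[1][0], z1[1][1]]]
        h2 = [[v2 + z2[0][0], z2[0][1]], [z2[1][0], z2[1][1]]]
        candidates = pure_neps(h1, h2)
        if not candidates:
            raise ValueError("no pure NEP at this stage; a mixed or correlated rule is needed")
        favored = h1 if prefer == 1 else h2
        j, k = max(candidates, key=lambda jk: favored[jk[0]][jk[1]])
        v1, v2 = h1[j][k], h2[j][k]
    return v1, v2

stage = ([[0, 0], [2, -1]], [[0, 2], [0, -1]])
values_pref_1 = backward_values([stage, stage], terminal=(0, 0), prefer=1)  # (2, 0)
values_pref_2 = backward_values([stage, stage], terminal=(0, 0), prefer=2)  # (0, 2)
```

The two selection rules lead to different game values from the same data, which illustrates why
a selection mechanism has to be fixed before continuation values can be computed.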



STOCHASTIC SWITCHING GAMES 






Without any assumptions on the structure of Z appearing in (13), the existence of a pure NEP
is not guaranteed. However, as shown by [TU] (see also [37] and references therein) a two-person
stopping game always admits a mixed NEP. Again, there is no uniqueness and we might need
equilibrium selection. Let $\gamma$ be an $(\mathcal{F}_t)$-adapted stochastic process taking values
in $\Delta_4$. Define the dependent randomized stopping rules (cf. (8))

$\tau_1(\gamma) \triangleq \inf\{t : \eta(t) < \gamma_{10}(t) + \gamma_{11}(t)\}$,
$\tau_2(\gamma) \triangleq \inf\{t : \eta(t) < \gamma_{01}(t) + \gamma_{11}(t)\}$, $\eta(t) \sim \mathrm{Unif}[0, 1]$ i.i.d.

Thus, conditional on the game still continuing, the stage-t payoff to player i is
$\sum_{j,k} \gamma_{jk}(t) Z_i^{jk}(t)$ and total expected payoff is

(15) $\mathbb{E}^{\gamma}\big[ J_i(s, \tau_1(\gamma), \tau_2(\gamma)) \big] = \mathbb{E}\Big[ \sum_{t=s}^{T-1} \sum_{j,k} \Big\{ \Big( \prod_{r=s}^{t-1} \gamma_{00}(r) \Big) \gamma_{jk}(t) Z_i^{jk}(t) \Big\} \Big]$.



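As before, correlation is implemented through private signals $\mu_i(t, \gamma)$ and a correlated
equilibrium of Z is a communication device $\gamma$ inducing a stopping strategy profile
$\tau(\gamma) = (\tau_1(\gamma), \tau_2(\gamma)) \in \mathcal{D}^1 \times \mathcal{D}^2$ such that
for i = 1, 2 and all $0 \le t < T$

(16) $V_i(t; \gamma, Z) \triangleq \mathbb{E}^{\gamma}[J_i(t, \tau(\gamma)) \mid \mathcal{F}_t] \ge \mathbb{E}^{\gamma}[J_i(t, \tilde{\tau}_i, \tau_{-i}(\gamma)) \mid \mathcal{F}_t]$, for all $\tilde{\tau}_i \in \mathcal{D}^i(t)$.

Observe that given a device $\gamma$ leading to a CEP, it must be that

(17) $V_i(t; \gamma, Z) = \sup_{\tau \in \mathcal{D}^i(t)} \mathbb{E}^{\gamma}\Big[ \sum_{s=t}^{(\tau \wedge \tau_{-i}(\gamma)) - 1} Z_i^{00}(s) + Z_i^{10}(\tau) 1_{\{\tau < \tau_{-i}\}} + Z_i^{01}(\tau_{-i}) 1_{\{\tau_{-i} < \tau\}} + Z_i^{11}(\tau) 1_{\{\tau = \tau_{-i}\}} \,\Big|\, \mathcal{F}_t \Big]$

which is a standard optimal stopping problem for player i in the enlarged filtration $\mathcal{G}^i$.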

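Lemma 3.2. [35, Theorem 2.3] Consider a CEP with communication device $\gamma$ of a stopping game
Z. Then for all $t \in \{0, 1, \ldots, T - 1\}$ we have

(18)
$\gamma_{00}(t)\big(\mathbb{E}[V_1(t+1) \mid \mathcal{F}_t] + Z_1^{00}(t)\big) + \gamma_{01}(t) Z_1^{01}(t) \ge \gamma_{00}(t) Z_1^{10}(t) + \gamma_{01}(t) Z_1^{11}(t)$;
$\gamma_{00}(t)\big(\mathbb{E}[V_2(t+1) \mid \mathcal{F}_t] + Z_2^{00}(t)\big) + \gamma_{10}(t) Z_2^{10}(t) \ge \gamma_{00}(t) Z_2^{01}(t) + \gamma_{10}(t) Z_2^{11}(t)$;
$\gamma_{10}(t) Z_1^{10}(t) + \gamma_{11}(t) Z_1^{11}(t) \ge \gamma_{10}(t)\big(\mathbb{E}[V_1(t+1) \mid \mathcal{F}_t] + Z_1^{00}(t)\big) + \gamma_{11}(t) Z_1^{01}(t)$;
$\gamma_{01}(t) Z_2^{01}(t) + \gamma_{11}(t) Z_2^{11}(t) \ge \gamma_{01}(t)\big(\mathbb{E}[V_2(t+1) \mid \mathcal{F}_t] + Z_2^{00}(t)\big) + \gamma_{11}(t) Z_2^{10}(t)$.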

Lemma 1 shows that a CEP of the stopping game is rational at each stage. For instance, 
the first inequality in (18) means that conditional on player 1 signal being 'continue', the ex- 
pected payoff to player 1 from continuing (the right-hand-side) is better than the expected payoff 

from stopping. In either scenario, player 2 implements the conditional strategy vf 2 (t, 7)|^ 1 (t, 7 )=o = 
1 700ft) 701ft) \ 

Woo ft) +701 ft) ' 7ooft)+7oift)<'' 

As shown by [35, Theorem 2.4], any finite-horizon stopping game with bounded payoffs admits
a CEP; in fact, outside the zero-sum and monotone cases we expect that a large number of CEPs
are possible. It is convenient to think of a communication device $\gamma$ leading to a CEP in (16) as a
measurable selector of local correlated equilibrium points in the one-shot 2x2 games. Thus, let
$\Gamma : \mathbb{T} \times \Omega \times \mathbb{R}^{2 \times 2 \times 2} \to \Delta_4$ be a measurable map such that for any 2x2 game $H$, $\Gamma(t, \omega, H)$ is a
CEP of $H$. Then using $\Gamma$, one may construct a communication device $\gamma$ by inductively using the
CEP $\Gamma(t, \omega, H(t, \omega))$, where $H(t, \omega)$ is the right-hand side in (14), and proceeding back in time.
Observe that for most $H$'s, $\Gamma(\cdot, H)$ is simply the unique NEP available, so that the selection feature
is "silent", and the device is only really activated when considering the coordination game. With
this perspective in mind, we call a correlation law $\Gamma$ a communication device which is based on
the same local criterion (for instance "minimize today's emissions" or "maximize today's value of
player 1").
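To make the notion of a local selector concrete, the following Python sketch enumerates the pure NEPs of a one-shot 2x2 game and checks the obedience constraints of a candidate correlated equilibrium, in the spirit of (18). It is only an illustration: the function names and the payoff numbers (a standard anti-coordination example) are hypothetical and are not taken from the model.

```python
from itertools import product


def pure_neps(h1, h2):
    """Return all pure Nash equilibria (j, k) of a 2x2 bimatrix game.

    h1[j][k], h2[j][k] are the payoffs to players 1 and 2 when player 1
    plays j and player 2 plays k.
    """
    equilibria = []
    for j, k in product((0, 1), repeat=2):
        best_for_1 = h1[j][k] >= h1[1 - j][k]   # player 1 cannot gain by deviating
        best_for_2 = h2[j][k] >= h2[j][1 - k]   # player 2 cannot gain by deviating
        if best_for_1 and best_for_2:
            equilibria.append((j, k))
    return equilibria


def is_correlated_equilibrium(gamma, h1, h2, tol=1e-12):
    """Check that following the signal drawn from gamma[j][k] is optimal for both players."""
    for j in (0, 1):  # player 1 receives signal j
        obey = sum(gamma[j][k] * h1[j][k] for k in (0, 1))
        deviate = sum(gamma[j][k] * h1[1 - j][k] for k in (0, 1))
        if obey < deviate - tol:
            return False
    for k in (0, 1):  # player 2 receives signal k
        obey = sum(gamma[j][k] * h2[j][k] for j in (0, 1))
        deviate = sum(gamma[j][k] * h2[j][1 - k] for j in (0, 1))
        if obey < deviate - tol:
            return False
    return True


# Hypothetical anti-coordination game with two pure NEPs.
h1 = [[6.0, 2.0], [7.0, 0.0]]
h2 = [[6.0, 7.0], [2.0, 0.0]]
third = 1.0 / 3.0
device = [[third, third], [third, 0.0]]  # equal weight on (0,0), (0,1), (1,0)
```

A correlation law in the sense above would be a rule that, given such a game, always returns one admissible `device`, for example the one maximizing player 1's expected payoff.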

3.3. Recursive Construction. We return to the emissions market duopoly setup. The emission
schedules of the two agents are interpreted as a sequence of regime changes. Thus, the single-stopping
game in the previous section is viewed as the sub-game for making the next regime switch.
The stopping game in Section 3.2 is accordingly denoted as a $(1,1)$-fold switching game and we
now consider $(n, m)$-fold switching games with game value functions $V^{n,m}$. These games have
a restricted set of possible production strategies; namely, the total number of regime switches over
the game horizon is bounded by $n$ and $m$ for the two players, respectively. Using the Markov property of the game
state and actions, it is not surprising that these various switching games are related to each other.

In terms of the notation of Section 3.2, we identify the running profit with $Z_i^{00}(t) = (a_i P_t - b_i X_t - c_i) u_i(t)$
and the other $Z_i^{jk}$'s with various game continuation values. For the remainder of the
section, we make a standing assumption that a communication device $\gamma$ is chosen and fixed. Let
us fix an initial state $P_s = p$, $X_s = x$ and initial production regime $\zeta = (\zeta_1, \zeta_2)$. Define a double
cascade of stopping games indexed by $n$ and $m$ via

$$V_i^{n,m}(s, p, x, \zeta) = V_i\big(s; \gamma, Z^{n,m}(\zeta)\big), \qquad n, m \ge 1, \tag{19}$$

which uses the notation of (16) based on the recursive payoff structure

$$\begin{cases}
(Z_i^{n,m})^{00}(t, \zeta) = (a_i P_t - b_i X_t - c_i)\, \zeta_i; \\
(Z_i^{n,m})^{01}(t, \zeta) = V_i^{n,m-1}(t, P_t, X_t, \zeta_1, 1 - \zeta_2) - 1_{\{i=2\}} K_{2, \zeta_2, 1-\zeta_2}; \\
(Z_i^{n,m})^{10}(t, \zeta) = V_i^{n-1,m}(t, P_t, X_t, 1 - \zeta_1, \zeta_2) - 1_{\{i=1\}} K_{1, \zeta_1, 1-\zeta_1}; \\
(Z_i^{n,m})^{11}(t, \zeta) = V_i^{n-1,m-1}(t, P_t, X_t, 1 - \zeta_1, 1 - \zeta_2) - K_{i, \zeta_i, 1-\zeta_i}.
\end{cases} \tag{20}$$

The boundary cases are as follows. First,

$$V_i^{0,0}(s, p, x, \zeta) = \mathbb{E}\Big[\sum_{t=s}^{T-1} (a_i P_t - b_i X_t - c_i)\, \zeta_i\Big];$$

next, $V_1^{n,0}(s, p, x, \zeta)$ and $V_2^{0,m}(s, p, x, \zeta)$ are identified with the single-player optimization problems
as in (12) (keeping the emission regime of the other player fixed at $\zeta_{-i}$). Finally, we take

$$V_2^{n,0}(s, p, x, \zeta) = \mathbb{E}^{(u_1^{n,*},\, \zeta_2)}\Big[\sum_{t=s}^{T-1} (a_2 P_t - b_2 X_t - c_2)\, \zeta_2\Big],$$

where $u_1^{n,*}$ is an optimal control for the problem defining $V_1^{n,0}$, and similarly for $V_1^{0,m}(s, p, x, \zeta)$.

3.4. Switching Game Equilibrium as Sequential Stopping Game Equilibrium. We now
proceed to glue together the sequential stopping games of $V^{n,m}$ and re-interpret the latter as value functions
of a switching game. For $n \ge 0$ denote by $\mathcal{U}_i^n \subset \mathcal{U}_i$ the set of all production strategies for player $i$
with at most $n$ switches. Consider the restricted repeated game with the payoffs of $\mathcal{G}$, where we require
$u_1 \in \mathcal{U}_1^n$ and $u_2 \in \mathcal{U}_2^m$, so that the first producer may change her production regime at most $n$
times, and the second producer at most $m$ times.

Our first task is to obtain a switching-game CEP that matches the definition of $V^{n,m}$. To do
so we pick a correlation law $\Gamma$; $\Gamma$ gives rise to a CEP of any stopping game, in particular it leads
to well-defined game values $V_i^{n,m}$ in (19). We now construct a communication device $\gamma^{n,m}$ for the
$(n, m)$-switching game. Let $k_i(t)$ be the number of production switches used by player $i$ by stage $t$.
The device $\gamma^{n,m}(t)$ at stage $t$ is taken to be $\Gamma\big(t, \omega, Z^{n - k_1(t),\, m - k_2(t)}(u(t))\big)$, defined in terms of the switch counts $k(t)$
and the latest regime $u(t)$. Note that the overall $\gamma^{n,m}$ is no longer Markovian since it has memory
of the number of switches made by each player, which is necessary in the constrained game. The
above construction is well-defined for all paths of $(P, X, u)$, even outside equilibrium.

Using $\gamma^{n,m}$ we proceed to construct switching controls $u^{n,m}$ for the $(n, m)$-switching game. To
simplify notation we write $\tau^{n,m} = \tau_1(\gamma^{n,m}) \wedge \tau_2(\gamma^{n,m})$, which is interpreted as the equilibrium
first stopping time for the game defined by (19) under the correlation law $\Gamma$. Given the starting
production regime $\zeta = (\zeta_1, \zeta_2)$, let us define the switching controls $u_1^{n,m}(s)$ for this game by

$$u_1^{n,m}(s) = \begin{cases}
\zeta_1 & \text{for } s < \tau^{n,m}; \\
1 - \zeta_1 & \text{for } \tau^{n,m} \le s < \tau^{n-1,m} \text{ when } \tau_1^{n,m} < \tau_2^{n,m}; \\
\zeta_1 & \text{for } \tau^{n,m} \le s < \tau^{n,m-1} \text{ when } \tau_2^{n,m} < \tau_1^{n,m}; \\
1 - \zeta_1 & \text{for } \tau^{n,m} \le s < \tau^{n-1,m-1} \text{ when } \tau_1^{n,m} = \tau_2^{n,m}; \\
\ldots & \text{and so on,}
\end{cases} \tag{21}$$

and similarly for $u_2^{n,m}(t)$. In words, $u_i^{n,m}$ keeps track of the production regime of the $i$-th agent
following the decision rules defined sequentially by descending through the family of the $V^{n,m}$-stopping
subgames (one stopping game at a time). Then by definition of (19) we have $u_1^{n,m} \in \mathcal{U}_1^n$ and
$u_2^{n,m} \in \mathcal{U}_2^m$. It can also be seen through an easy induction argument that

$$V_i^{n,m}(s, p, x, \zeta) = V_i(s, p, x, \zeta; u^{n,m}), \tag{22}$$

so that the switching control $u^{n,m}$ of (21) achieves the game values $V^{n,m}$ defined recursively
in (19). Moreover, the next theorem shows that the pair $(u_1^{n,m}, u_2^{n,m})$ is in fact a correlated
equilibrium (using correlation device $\gamma^{n,m}$) for the game $\mathcal{G}$ over the control set $\mathcal{U}_1^n \times \mathcal{U}_2^m$:

Theorem 3.3. For all $n \ge 0$ and $u_1 \in \mathcal{U}_1^n$ we have $V_1(t, \cdot\,; u_1^{n,m}, u_2^{n,m}) \ge V_1(t, \cdot\,; u_1, u_2^{n,m})$. Similarly,
for all $m \ge 0$ and $u_2 \in \mathcal{U}_2^m$ we have $V_2(t, \cdot\,; u_1^{n,m}, u_2^{n,m}) \ge V_2(t, \cdot\,; u_1^{n,m}, u_2)$.

Proof. The idea of the proof is to make use of the Markov structure of our problem and apply
induction. The other key tool is that given $\gamma^{n,m}$, we can look at one player at a time, which
essentially reduces to a single-player problem studied before, see (17).

Due to symmetry, it suffices to prove the result for player 1. When $m = 0$ the other player
cannot act, the game becomes trivial and Theorem 3.3 is just a re-statement of Proposition 3.1.
Conversely, when $n = 0$, the first player cannot act and there is nothing to prove. Using induction
we assume that the theorem has been shown for the pairs $(n-1, m-1)$, $(n, m-1)$ and $(n-1, m)$;
let us show it for the case $(n, m)$. Given an arbitrary $u_1 \in \mathcal{U}_1^n$, write it as $u_1 = (\tau^1, \tilde u_1)$ where
$\tilde u_1 \in \mathcal{U}_1^{n-1}$ denotes the remainder of $u_1$ after the first switch time $\tau^1$. Let $\tau^{2,*}$ be the
first switch for the second player dictated through $\gamma^{n,m}$. Define $\tau = \tau^1 \wedge \tau^{2,*}$. Also, for notational
convenience we omit all the arguments of $V^{n,m}$ except for the time variable. Then the strong
Markov property of $(P, X)$ and the way $u_2^{n,m}(t)$ was constructed show that



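$$\mathbb{E}^{(u_1, u_2^{n,m}, \gamma^{n,m})}\Big[\sum_{s=\tau}^{T-1} (a_1 P_s - b_1 X_s - c_1) u_1(s)\Big] = \mathbb{E}^{(u_1, u_2^{n,m}, \gamma^{n,m})}\Big[V_1(\tau; \tilde u_1, u_2^{n-1,m}) 1_{\{\tau^1 < \tau^{2,*}\}} + V_1(\tau; u_1, u_2^{n,m-1}) 1_{\{\tau^1 > \tau^{2,*}\}} + V_1(\tau; \tilde u_1, u_2^{n-1,m-1}) 1_{\{\tau^1 = \tau^{2,*}\}}\Big].$$

Conditioning on $\tau^1$ and $\tau^{2,*}$ we therefore have

$$V_1(t; u_1, u_2^{n,m}) = \mathbb{E}^{(u_1, u_2^{n,m}, \gamma^{n,m})}\Big[\sum_{s=t}^{\tau-1} (a_1 P_s - b_1 X_s - c_1) u_1(s) + V_1(\tau^1; \tilde u_1, u_2^{n-1,m}) 1_{\{\tau^1 < \tau^{2,*}\}} + V_1(\tau^{2,*}; u_1, u_2^{n,m-1}) 1_{\{\tau^1 > \tau^{2,*}\}} + V_1(\tau^1; \tilde u_1, u_2^{n-1,m-1}) 1_{\{\tau^1 = \tau^{2,*}\}}\Big];$$

by induction hypothesis we have the inequality

$$\begin{aligned}
V_1(t; u_1, u_2^{n,m}) &\le \mathbb{E}^{(u_1, u_2^{n,m}, \gamma^{n,m})}\Big[\sum_{s=t}^{\tau-1} (a_1 P_s - b_1 X_s - c_1) u_1(s) + V_1(\tau^1; u_1^{n-1,m}, u_2^{n-1,m}) 1_{\{\tau^1 < \tau^{2,*}\}} \\
&\qquad + V_1(\tau^{2,*}; u_1^{n,m-1}, u_2^{n,m-1}) 1_{\{\tau^1 > \tau^{2,*}\}} + V_1(\tau^1; u_1^{n-1,m-1}, u_2^{n-1,m-1}) 1_{\{\tau^1 = \tau^{2,*}\}}\Big] \\
&\le \sup_{\tau^1} \mathbb{E}^{(u_1, u_2^{n,m}, \gamma^{n,m})}\Big[\sum_{s=t}^{\tau-1} (a_1 P_s - b_1 X_s - c_1) u_1(s) + V_1(\tau^1; u^{n-1,m}) 1_{\{\tau^1 < \tau^{2,*}\}} \\
&\qquad + V_1(\tau^{2,*}; u^{n,m-1}) 1_{\{\tau^1 > \tau^{2,*}\}} + V_1(\tau^1; u^{n-1,m-1}) 1_{\{\tau^1 = \tau^{2,*}\}}\Big] \\
&= V_1(t; u_1^{n,m}, u_2^{n,m}),
\end{aligned}$$

where the last line uses the relationship (22), the construction of the stopping game defining $V^{n,m}$
in (20), and property (17).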

□

The above construction leads to the key result of this section, which characterizes CEPs of switching
games, establishes their existence, and gives a recursive formula for the resulting game value
functions. For a 2 x 2 game $H$ defined in (14) and correlation device $\gamma$ we denote the respective
game values as

Theorem 3.4. Fix a correlation law $\Gamma$. Then $\Gamma$ gives rise to a CEP of the switching game $\mathcal{G}$.
Moreover, the corresponding value functions $V_i(t, P_t, X_t, \zeta; \Gamma)$ solve

$$\begin{aligned}
V_1(t, P_t, X_t, \zeta) &= \gamma_{00}(t) Y_1(\zeta_1, \zeta_2) + \gamma_{01}(t) Y_1(\zeta_1, 1 - \zeta_2) \\
&\quad + \gamma_{10}(t)\big(Y_1(1 - \zeta_1, \zeta_2) - K_{1, \zeta_1, 1-\zeta_1}\big) + \gamma_{11}(t)\big(Y_1(1 - \zeta_1, 1 - \zeta_2) - K_{1, \zeta_1, 1-\zeta_1}\big), \\
V_2(t, P_t, X_t, \zeta) &= \gamma_{00}(t) Y_2(\zeta_1, \zeta_2) + \gamma_{01}(t)\big(Y_2(\zeta_1, 1 - \zeta_2) - K_{2, \zeta_2, 1-\zeta_2}\big) \\
&\quad + \gamma_{10}(t) Y_2(1 - \zeta_1, \zeta_2) + \gamma_{11}(t)\big(Y_2(1 - \zeta_1, 1 - \zeta_2) - K_{2, \zeta_2, 1-\zeta_2}\big),
\end{aligned} \tag{23}$$

where $Y_i(t, \zeta) = \mathbb{E}^{\zeta}[V_i(t+1, \zeta) \mid \mathcal{F}_t] + (a_i P_t - b_i X_t - c_i)\zeta_i$ and we have omitted the dependence of $Y_i$ on $t$ in (23).
The equilibrium controls can be taken as $u^* = u^{T,T}$, as defined in (21).

Proof. We wish to take $n, m \to \infty$ in Theorem 3.3. Because for $n > m$, $\mathcal{U}^m \subset \mathcal{U}^n$, it follows that for
a fixed $m$, $V_1^{n,m}$ is increasing in $n$ (and for a fixed $n$, $V_2^{n,m}$ is increasing in $m$). For our discrete-time
game, at most $T$ regime switches are possible for each player. Therefore $u_i^* \in \mathcal{U}_i^T$ and it follows
that $V_i^{n,n} = V_i$ for all $n \ge T$. In particular, a switching CEP based on $\Gamma$ results by using $\gamma^{T,T}$.

Moreover, at equilibrium at most one switch is made at any given stage due to the triangle
condition on $K_{ijk}$. Therefore, if it is optimal to switch at stage $t$ from $\zeta$ to $u$, then already
starting at regime $u$ at $t$ and the same state variables it is optimal to make no changes, so that
$V_i(t, u) = \mathbb{E}^{u}[V_i(t+1, u) \mid \mathcal{F}_t] + (a_i P_t - b_i X_t - c_i) u_i$ for that scenario. Combining these facts with the
form of (14) and dropping the constraints on the number of switches, we may express all payoffs in
terms of next-stage game values. The recursion (23) is now obtained by making this substitution
in (14). □
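One stage of the recursion (23) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the device probabilities `gamma` and the continuation payoffs `y1`, `y2` are taken as given 2x2 arrays indexed by (player 1 switches, player 2 switches), and each player's switching cost is a single scalar; the function name and the numbers are hypothetical.

```python
def stage_game_values(gamma, y1, y2, k1, k2):
    """Evaluate the right-hand side of (23) for both players at one stage.

    gamma[j][k] : probability that player 1 switches iff j == 1 and
                  player 2 switches iff k == 1.
    y1[j][k], y2[j][k] : continuation payoffs Y_1, Y_2 at the regime reached
                  when player 1 switches iff j and player 2 switches iff k.
    k1, k2      : switching cost paid by a player only when she herself switches.
    """
    v1 = sum(gamma[j][k] * (y1[j][k] - k1 * j) for j in (0, 1) for k in (0, 1))
    v2 = sum(gamma[j][k] * (y2[j][k] - k2 * k) for j in (0, 1) for k in (0, 1))
    return v1, v2


# Hypothetical numbers: the device mixes equally between "nobody switches"
# and "both switch".
gamma = [[0.5, 0.0], [0.0, 0.5]]
y1 = [[1.0, 2.0], [3.0, 4.0]]
y2 = [[1.0, 2.0], [3.0, 4.0]]
v1, v2 = stage_game_values(gamma, y1, y2, k1=0.2, k2=0.2)
```

With these inputs each player receives 0.5·1 + 0.5·(4 − 0.2) = 2.4; a backward recursion would feed such values into the next (earlier) stage.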

4. Numerical Implementation 



Theorem 3.4 shows that a game value and equilibrium strategy profile can be obtained recursively
by solving the 1-period 2-by-2 games in (23). The payoffs of those games are given in terms of
conditional expectations of next-stage game values. Therefore, a numerical implementation hinges
on accurate evaluation of these expectations. Since our state space in $(P, X)$ is continuous, it
is impossible to make this computation exactly. Below we present two possible approximation
approaches.

4.1. Markov Chain Approximation Algorithm. Our model would be simplified if the continuous
state space of $(P, X)$ is discretized. Let $(\hat P, \hat X)$ be an approximating discrete-state process
with $(\hat P_t, \hat X_t)$ living on a finite subset $D_t \subset \mathbb{R}_+^2$. If the pair $(\hat P, \hat X)$ is furthermore chosen to be again
Markov, this is known as the Markov Chain Approximation (MCA) method of [25]. With such
$(\hat P, \hat X)$, a conditional expectation $\mathbb{E}[f(\hat P_{t+1}, \hat X_{t+1}) \mid \hat P_t = p, \hat X_t = x] \simeq \mathbb{E}[f(P_{t+1}, X_{t+1}) \mid P_t = p, X_t = x]$
for any measurable function $f$ is just a weighted sum based on the transition probability matrix
of $(\hat P, \hat X)$. The backward recursion in (23) for $\hat V_i$, the corresponding approximation of $V_i$, can now
be implemented directly for each stage $t$ and each possible state of $(\hat P_t, \hat X_t) \in D_t$. A well-known
procedure constructs $(\hat P, \hat X)$ by taking $D_t$ to be a 2-dimensional regular grid or lattice and allowing
state transitions only between neighboring grid points. Moreover, the transition probabilities of
$(\hat P, \hat X)$ are chosen so as to have local consistency in the first two moments with the 1-step transition
densities of $(P, X)$; see [25, Chapter 5].
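The "weighted sum" form of the conditional expectation can be sketched as follows. The example is a toy: a hypothetical three-state chain stands in for the two-dimensional grid, and the function name is illustrative only.

```python
import numpy as np


def mca_conditional_expectation(transition, f_next):
    """Return E[f(next state) | current state] for every current state.

    transition[i, j] : probability of moving from grid state i to grid state j.
    f_next[j]        : value of f at grid state j at the next stage.
    """
    transition = np.asarray(transition, dtype=float)
    f_next = np.asarray(f_next, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to one")
    return transition @ f_next  # one weighted sum per current state


# Toy chain that only allows moves to neighboring states.
transition = np.array([[0.50, 0.50, 0.00],
                       [0.25, 0.50, 0.25],
                       [0.00, 0.50, 0.50]])
f_next = np.array([1.0, 2.0, 4.0])
continuation = mca_conditional_expectation(transition, f_next)
```

In an actual implementation the states would enumerate the grid points of $D_t$, and the transition matrix would be sparse because only neighboring grid points communicate.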

To use this approach in our model, one must take into account the price impact. Therefore, we
construct four approximations $(\hat P, \hat X^{\zeta})$ indexed by the possible joint production regimes $\zeta \in \{0, 1\}^2$
that induce different local dynamics of $\hat X^{\zeta}$, see (2). In other words, our effective state variables
are $(\hat P, \hat X, \zeta)$. For every possible combination $(t, p, x, \zeta) \in \mathbb{T} \times D_t \times \{0, 1\}^2$ the relation (23) is
then solved through backward recursion. A generic convergence proof (as the grid spacing tends to
zero) of this procedure for finite-horizon non-zero-sum stochastic games was obtained in [24]. Note
that in our model the controls $u(t)$ are discrete and finite-valued and therefore all the compactness
conditions in [24] for the control space are automatically satisfied.



4.2. Least Squares Monte Carlo Approach. Like classical dynamic programming, the MCA 
method above suffers from the curse of dimensionality. Indeed, the size of the approximating grid 
grows exponentially in the dimension of the state variables. In our basic model (P, X) are two- 
dimensional; however realistic implementations are likely to take multi-dimensional factor models 
for P and (possibly) X. Thus, it is helpful to seek a more robust algorithm. 

A seminal idea due to [9], [15], [27] is to use a cross-sectional regression combined with a Monte Carlo
simulation to compute the relevant conditional expectations. The key step is a global approximation
of the maps $(t, p, x, \zeta) \mapsto V_i(t, p, x, \zeta)$ and equilibrium one-step strategies $(t, p, x, \zeta) \mapsto u(t, p, x, \zeta)$
(based on a fixed correlation law $\Gamma$) via a random sample of $(P_t, X_t)$. The construction is iterative
and backward in time.

Suppose that the current date is $t$ and we already know all the approximations $\hat V_i(s, p, x, \zeta) \simeq V_i(s, p, x, \zeta)$
for $s > t$ and the corresponding equilibrium strategy profiles. Given a collection of
initial points $(p_t^n, x_t^n)$, for $n = 1, \ldots, N$, and an arbitrary starting emission regime $\zeta = u(t)$, we first
simulate the future cashflows on $[t+1, T]$ for each scenario $n$. This is done by iteratively updating
$(p_{s+1}^n, x_{s+1}^n)$ through an independent draw from the conditional law $\mathbb{P}^{u^n(s)}$ and then computing
the equilibrium actions $u_i^n(s)$ of each player for $s = t+1, \ldots, T$ based on the estimated future
game values $\hat V_i(s, p_s^n, x_s^n, u^n(s))$ and the chosen correlation law $\Gamma$. If $\Gamma$ leads to randomized
strategies, such a randomization is naturally implemented as part of this simulation. The realized
pathwise cashflow $\vartheta_i^n(t+1, \zeta)$ represents an empirical draw from $V_i(t+1, P_{t+1}, X_{t+1}^{\zeta}, \zeta)$ conditional
on $P_t = p_t^n$, $X_t = x_t^n$. We now perform a cross-sectional regression of $(\vartheta_i^n(t+1, \zeta))_{n=1}^N$ against
$(p_t^n, x_t^n)_{n=1}^N$ by using a collection of basis functions $B_\ell(t, p, x)$, $\ell = 1, \ldots, r$. The regression yields
the predicted continuation values

$$\hat v_i(t, p_t^n, x_t^n, \zeta) \simeq \mathbb{E}^{\zeta}\big[V_i(t+1, P_{t+1}, X_{t+1}, \zeta) \,\big|\, P_t = p_t^n, X_t = x_t^n\big].$$

Finally, using $\hat v_i$ together with the current payoffs and switching costs and the correlation law $\Gamma$,
we solve for the equilibrium game values $\hat V_i(t, p_t^n, x_t^n, u)$ for each production regime $u$ by applying
the stage-$t$ sub-game of Theorem 3.4. The computed game equilibrium also provides the map
$(t, p_t^n, x_t^n, u) \mapsto u^*(t)$ for the equilibrium strategies. The regression results allow us to further extend
this to an arbitrary initial condition $(t, p, x, u)$. Working back to $t = 0$, the final answer (which is a
random variable depending on the Monte Carlo sample) is simply the average $\hat V_i(0, p_0, x_0, \zeta_0) = \frac{1}{N} \sum_{n=1}^N \vartheta_i^n(0, \zeta_0)$.
The initial collection $(p_t^n, x_t^n)_{n=1}^N$ is obtained by simulation. Since $X$ is affected by the price
impact of $u$, to simulate we need to select some anterior auxiliary strategy profile $u^0$.
While in theory it can be arbitrary, in practice it should be close to the equilibrium $u^*$. Indeed,
the collection $(\hat v_i(t, p_t^n, x_t^n, \zeta))_{n=1}^N$ is supposed to approximate $V_i(t, P_t, X_t^*, u(t))$ where $X_t^*$ is the
equilibrium CO2 allowance price. Because the $\hat v_i$'s are computed by employing regression, the resulting
approximation cannot be uniformly good on the whole state space. From the point of view of accurate solutions, it
needs to be good around the region of interest for $X^*$. Thus, we need most of the $x_t^n$'s to be in
that (a priori unknown) neighborhood. To overcome this difficulty, as the algorithm works back
through time, the future paths $(p_s^n, x_s^n)$, $s > t$, are re-computed using the now-available (approximately)
equilibrium strategies $u^*(s)$. To further mitigate the problem, we iteratively re-do the
whole simulation and subsequent backward recursion a few times (in practice three iterations suffice),
using the computed $u^*$ from one iteration as the anterior in the next one.

Selection of basis functions should reflect the expected shape of $(p, x) \mapsto V_i(t, p, x, \zeta)$. A typical
choice is to use low-degree polynomial basis functions, such as $p, p^2, x, x^2$, etc. In practice, $r = 5$ to $7$
basis functions and $N = 32000$ to $50000$ paths suffice. A large degree of customization, such as time-varying
bases, constrained least-squares regression, variance reduction methods, etc., is possible
to speed up the computations. The Appendix summarizes the above scheme in pseudo-code in
Algorithm 2. It calls as a sub-routine Algorithm 1 that carries out the forward simulations of $\vartheta^n$.
The cost of simulations in Algorithm 2 is $O(N \cdot T^2)$, which consists of re-simulating $N$ paths on
$[t, T]$ as $t$ goes from $T-1$ to zero (see Algorithm 1). The cost of doing regression against $r$ basis
functions on each path and for each stage is $O(N \cdot T \cdot r^3)$ and the cost of computing continuation
values is $O(N \cdot T^2 \cdot r)$. The memory requirements from storing all the simulation paths are $O(N \cdot T)$.
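The cross-sectional regression step can be sketched with NumPy as below. This is an illustrative sketch, not the implementation used for the numerical results: the basis $\{1, p, p^2, x, x^2\}$ is just one low-degree polynomial choice, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def polynomial_basis(p, x):
    """Design matrix with columns 1, p, p^2, x, x^2 (an illustrative basis)."""
    p = np.asarray(p, dtype=float)
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(p), p, p**2, x, x**2])


def fit_continuation_values(p, x, cashflows):
    """Least-squares regression of realized pathwise cashflows on the basis."""
    design = polynomial_basis(p, x)
    coef, _, _, _ = np.linalg.lstsq(design, np.asarray(cashflows, dtype=float), rcond=None)
    return coef


def predict_continuation_values(coef, p, x):
    """Predicted continuation values at arbitrary states (p, x)."""
    return polynomial_basis(p, x) @ coef


# Synthetic check: cashflows that lie exactly in the span of the basis.
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 2.0, size=500)
x = rng.uniform(0.0, 2.0, size=500)
cashflows = 1.0 + 2.0 * p - 0.5 * x**2
coef = fit_continuation_values(p, x, cashflows)
```

In the scheme described above, one such regression would be run for each player, each stage $t$ and each regime $\zeta$, with the simulated pathwise cashflows as the response.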



4.3. Numerical Examples. In this section we illustrate our analysis with a numerical case-study.
The selected model parameters are listed in Table 1. The example represents emission scheduling
of two producers over one calendar year; all the parameters of $(P, X)$ are in annualized units and
we use $T = 26$ bi-weekly periods to model the scheduling flexibility. Note that the electricity price
$P_t$ is more volatile than the CO2 allowance price $X_t$; also the mean-reversion parameter $\kappa_X$ of $X$
is quite large, implying a significant price impact. In (2) we take $f(\zeta_1, \zeta_2) = \log(12 + 8\zeta_1 + 4\zeta_2)$,
so that the mean-reversion level $e^{f}$ of $X$ is linear in the production regimes of producers 1 and 2,
with producer 1 having more influence due to emitting twice as much carbon, $b_1 = 2 b_2 \Rightarrow g_1 = 2 g_2$.
The stylized production/emission parameters represent a dirty "coal" producer 1 who has low
input costs but needs lots of allowances, and a clean "natural gas" producer 2 who has high
fixed costs but small sensitivity to allowance prices (and can generate twice as much electricity).
Observe that if both producers emit simultaneously for a long period of time, then we expect
$P_t \simeq \bar P = 45$, $X_t \simeq e^{f(1,1)} = 24$, meaning that everyone will be losing money. Therefore, extended
joint emissions are not sustainable.

0.4
T      1
ρ
K_1    0.2
K_2    0.2

Table 1. Model Parameters for the Examples in Section 4.3

Correlation Law    V_1(0, p_0, x_0)    V_2(0, p_0, x_0)
Utilitarian        5.30                4.14
Egalitarian        5.33                4.20
Preferential 1     5.39                4.11
Preferential 2     5.02                4.24

Table 2. Comparison of equilibrium game values for different correlation laws $\Gamma$.
Standard errors of the Monte Carlo scheme are about 1%. Parameters are as given
in Table 1.
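As a quick arithmetic check of the price-impact specification, the sketch below tabulates the implied long-run allowance price level for each joint production regime. It assumes that $f(\zeta_1, \zeta_2) = \log(12 + 8\zeta_1 + 4\zeta_2)$ is the mean-reversion level of $\log X$, so that the corresponding level of $X$ is $e^{f}$; the function names are hypothetical.

```python
import math


def log_mean_reversion_level(z1, z2):
    """f(z1, z2) = log(12 + 8*z1 + 4*z2), with z_i in {0, 1} the production regimes."""
    return math.log(12 + 8 * z1 + 4 * z2)


def allowance_price_levels():
    """Implied long-run level exp(f) of X for each joint regime (z1, z2)."""
    return {(z1, z2): math.exp(log_mean_reversion_level(z1, z2))
            for z1 in (0, 1) for z2 in (0, 1)}


levels = allowance_price_levels()
```

Up to floating-point rounding the levels are 12 when nobody emits, 16 when only producer 2 emits, 20 when only producer 1 emits and 24 under joint emissions, so producer 1's regime moves the level twice as much as producer 2's.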

A large variety of CEPs are possible in our model; Table 2 shows the game values corresponding to
four representative correlation laws. These values were obtained by running Algorithm 2 discussed
in Section 4.2 using $N = 40000$ paths, and the basis functions $\{1, p, x, x^2, (2p - x - 80)^+, (p - 2x - 10)^+\}$.
We find that different correlation laws modify the expected profit of the producers by
3% to 5%. As expected, individual producer values are maximized by the preferential equilibria that
always favor the respective player. Counter-intuitively, the egalitarian CEP (which maximizes at
each stage the minimum continuation value) produces larger game values to both producers than
the utilitarian CEP (which maximizes the sum of continuation values). This occurs because the
correlation law is applied stage-wise and optimizes a local criterion; there is no guarantee that the
corresponding global criterion is respected. A similar phenomenon was observed in [35, Section
5.4].

To illustrate the equilibrium strategy profiles, Figure 1 shows the empirical regions in the $(P, X)$-space
corresponding to different equilibrium strategies at a fixed date $t = 7$ (i.e. about three
months into the year) using the Preferential-1 correlation law that always favors producer 1. As
expected, when the current P&L of both producers is strongly negative (upper-left corner), the
equilibrium action is $u^*(t) = (0, 0)$; when it is strongly positive (large $P_t$) the equilibrium is to
generate electricity, $u^*(t) = (1, 1)$. Because of the differing carbon-efficiencies of the producers,
there are also large regions where exactly one producer can generate profit (e.g. around $\{P_t \in [40, 45], X_t^* \in [10, 12]\}$
only producer 2 is profitable). However, at the border regions, the price
impact and competition create new effects. In Figure 1 we observe the emergence of a local
anti-coordination game around $\{(P_t, X_t^*) = (50, 15)\}$, and a competitive game around $\{(P_t, X_t^*) = (50, 12)\}$.
We cannot analytically verify whether a particular type of game may emerge locally; thus
the competitive game region in Figure 1 could be either a true phenomenon or an aberration due
to numerical errors (e.g. poor regression fit in that region). Note that most simulated equilibrium
paths for $X_t^*$ stay above $x = 13$, so the competitive game scenario at $t = 7$ is very unlikely to be
realized (i.e. very few paths hit that region).




[Figure 1 plot; horizontal axis: Electricity Price $P_t$, from 30 to 70.]

Figure 1. Equilibrium game strategy $u^*(t)$ as a function of $(P_t, X_t^*)$ for $t = 7$ and
$\zeta = (0, 0)$. The green region denotes the anti-coordination game-type where the
Preferential-1 correlation law is used, and the red region denotes the competitive
game-type where the unique mixed NEP is chosen. Elsewhere, we label the regions
according to the unique pure NEP implemented.

To better illustrate the optimal strategy over time, Figure 2 shows a sample path of the equilibrium
price $(X_t^*)$ for one $\omega$. Analogously to single-player problems, the CO2 allowance price
undergoes hysteresis cycles [13]. Thus, when $(X_t^*)$ is low, production becomes profitable. This
leads to increased emissions and $X_t^*$ tends to rise through the price impact mechanism. In turn,
the ensuing higher emission costs eventually curtail production and $X_t^*$ falls back. The presence of
switching costs $K_i$ lowers the scheduling flexibility of the producers and further amplifies this cycle
through inertia.

Figure 2. Sample equilibrium path of the emissions game. Top left panel: cumulative
realized P&L of the players as a function of $t$. Bottom left panel: the
electricity-carbon spread of each producer for the current time step. Right panel:
evolution of the controlled equilibrium allowance price $X_t^*$, as well as the implemented
strategy $u^*(t) \in \{00, 01, 10, 11\} \equiv \{1, \ldots, 4\}$. The panels were generated
using Algorithm 1 given in the Appendix.

5. Conclusion 

In this paper we studied a new type of stochastic games motivated by dynamic emission
schedules of energy producers under cap-and-trade schemes. Because multiple game equilibria
can emerge, we explored various correlated equilibria. It is an interesting economic policy question
which equilibrium is likely or desirable to be implemented and how the regulator can steer market
participants towards that choice. For example, putting a price on emissions is supposed to partially
drive out "dirty" producers. It would be an intriguing exercise to study how much these effects
depend on equilibrium selection and whether blockading of inefficient polluters is possible under
some equilibria.

In our simplified model, the producers only made binary emission decisions at each stage. On
a practical level, much finer granularity is available. It would be straightforward to extend our
problem and allow a more general finite-state control set of size $|A|$. The only modification would
be to replace the 2x2 bimatrix games with more general $|A| \times |A|$ bimatrix games. The theory for more
than two producers is incomplete and it is an open problem to establish existence of CEP/NEP for
multi-player stopping games (see [38] for the current state of the art).

5.1. Further Extensions. Several aspects of our model merit further analysis. The dynamics for
CO2 allowance prices in (2) were selected to capture succinctly the price impact of each producer,
leaving out other important features. As described in the introduction, as the permit expiration date
$T$ approaches, the CO2 price should converge either to zero (if excess permits remain) or to a fixed
upper bound $\bar x$ (the penalty for emitting without an allowance). New (time-dependent) stochastic
models are needed to mimic this property, see [18]. Also, some cap-and-trade proposals will
allow trading of allowances by financial participants, whence no-arbitrage restrictions might have to
be imposed on the dynamics of $X$. All these possibilities can be handled straightforwardly, since
the main construction is for arbitrary $X$-dynamics. Ideally, a fully endogenous model is desired
for allowance prices; namely, $X_t$ should be a function of total expected emissions until $T$ compared
to total current supply, i.e. have a characterization in terms of conditional expectations of future
equilibrium emission schedules. See [6], [7], [10] for such price-formation models and related general
equilibrium frameworks. These extensions will be considered in forthcoming papers.

Our formulation was in discrete time; while this is sufficient for practical purposes, it is of great
theoretical interest to construct a continuous-time counterpart of the model. The overall structure of a
switching game as a sequence of stopping games carries over straightforwardly to continuous time.
However, a description of correlated stopping equilibria in continuous time has not been attempted so
far. In fact, the only reference dealing with randomized continuous-time stopping games is [40] (see
also [26] for the latest results on general continuous timing games). Note first that in continuous time
one must work with Nash $\epsilon$-equilibria, since all stopping strategies are defined only in the almost-sure
sense. Second, to ensure the representation of $V_i$ as iterative stopping games through $V_i^{n,m}$, it
is necessary to show a priori that each player makes finitely many regime switches. At this point
we are not able to state any conditions to guarantee this, except requiring mandatory "cool-off"
periods between each emission regime switch.

In our Markovian setting, solutions of continuous-time single-player switching problems have
representations in terms of reflected backward stochastic differential equations (BSDE) [20]. This
representation should continue to hold in a game setting and will be explored in a separate paper.
Related results have already been obtained for stochastic differential game analogues of our setup,
where $u(t)$ has continuous state space, see [21], [22].

Algorithm 1 Simulating one realized cashflow path $\vartheta_i(s)$, $0 \le s \le T$

input: Basis functions $B_\ell(p, x)$, $\ell = 1, \ldots, r$; regression coefficients $\alpha_i(t, \zeta)$; correlation law $\Gamma$
input: Initial condition $(p_0, x_0, u(0))$; horizon $T$
Initialize $\vartheta_i(0) \leftarrow 0$   // Realized cashflows
for $t = 0, \ldots, T - 1$ do
    for each $\zeta \in \{0, 1\}^2$ do
        // Evaluate the predicted continuation values from taking action $\zeta$
        Set $q_i(t, \zeta) \leftarrow \sum_{\ell=1}^r \alpha_i^\ell(t, \zeta) B_\ell(p_t, x_t) - K_{i, u_i(t), \zeta_i} + (a_i p_t - b_i x_t - c_i)\zeta_i$
    end for
    Compute the stage-$t$ game values based on $q_i(t, \cdot)$, $i = 1, 2$ and $\Gamma$, see (23)
    Obtain the correlated equilibrium strategy $\hat u(t)$
    if $\hat u(t)$ is mixed then
        Perform randomization to obtain the realized action pair $u(t + 1)$
    else
        Set $u(t + 1) \leftarrow \hat u(t)$   // $\hat u(t)$ is pure
    end if
    Update $\vartheta_i(t + 1) \leftarrow -K_{i, u_i(t), u_i(t+1)} + (a_i p_t - b_i x_t - c_i) u_i(t + 1)$, $i = 1, 2$
    Make an independent draw $(p_{t+1}, x_{t+1}) \sim \mathbb{P}^{u(t+1)}(\cdot \mid p_t, x_t)$
end for

Algorithm 2 Computing Correlated Equilibrium Game Values



input: $N > 0$ (number of paths); $B_\ell(p, x)$, $\ell = 1, \ldots, r$ ($r$ regression basis functions)
input: Correlation law $\Gamma$
Select anterior strategy profile $u^0$
for each regime $\zeta \in \{0, 1\}^2$ do
    Simulate $N$ i.i.d. paths $(p_t^n, x_t^{\zeta,n})_{n=1}^N$ under $\mathbb{P}^{u^0}$ using Algorithm 1 and $p_0^n = p_0$, $x_0^{\zeta,n} = x_0$
end for
Initialize $\vartheta_i^n(T, \zeta) \leftarrow 0$, $n = 1, \ldots, N$
for $t = (T - 1), \ldots, 1, 0$ do
    for each regime $\zeta$ do
        Evaluate $B_\ell(p_t^n, x_t^{\zeta,n})$ for $\ell = 1, \ldots, r$ and $n = 1, \ldots, N$
        Regress $\alpha_i(t, \zeta) \leftarrow \arg\min_{\alpha \in \mathbb{R}^r} \sum_{n=1}^N \big| \vartheta_i^n(t + 1, \zeta) - \sum_{\ell=1}^r \alpha^\ell B_\ell(p_t^n, x_t^{\zeta,n}) \big|^2$
    end for
    for each current regime $u$ do
        for each $\zeta \in \{0, 1\}^2$, and each $n = 1, \ldots, N$ do
            // Compute the predicted continuation value for each player from taking action $\zeta$
            Set $q_i^n(t, u, \zeta) \leftarrow \sum_{\ell=1}^r \alpha_i^\ell(t, \zeta) B_\ell(p_t^n, x_t^{u,n}) - K_{i, u_i, \zeta_i} + (a_i p_t^n - b_i x_t^{u,n} - c_i)\zeta_i$
        end for
        for each path $n = 1, \ldots, N$ do
            Compute the stage-$t$ game values based on $q^n(t, u, \cdot)$ and $\Gamma$, see (23)
            Obtain the equilibrium policy $u^{n,*}(t, u)$
            Recompute $\vartheta^n(t, u)$ using $u^{n,*}(t, u)$ at stage $t$ and Algorithm 1 for future stages
        end for
    end for
end for
return $\hat V_i(0, p_0, x_0, \zeta) \simeq \frac{1}{N} \sum_{n=1}^N \vartheta_i^n(0, \zeta)$
return Regression coefficients $\alpha_i(t, \zeta)$ summarizing equilibrium strategies
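The randomization step of Algorithm 1 (drawing a realized action pair when the equilibrium strategy is mixed) can be sketched in Python as follows. The sketch assumes the mixed strategy is given as a 2x2 table of probabilities over joint actions and uses a single uniform draw; the function name is hypothetical and the standard-library `random` module stands in for whatever generator an implementation would use.

```python
import random


def sample_action_pair(gamma, rng=random):
    """Draw a joint action (j, k) with probability gamma[j][k].

    A single uniform draw is compared against the cumulative probabilities,
    so the two players' actions are sampled jointly, not independently.
    """
    u = rng.random()
    cumulative = 0.0
    for j in (0, 1):
        for k in (0, 1):
            cumulative += gamma[j][k]
            if u < cumulative:
                return (j, k)
    return (1, 1)  # guard against rounding when the probabilities sum to just under one


# A pure strategy is the special case of a point mass.
pure = [[0.0, 0.0], [1.0, 0.0]]
action = sample_action_pair(pure, random.Random(0))
```

Passing an explicitly seeded `random.Random` instance, as in the last line, keeps simulated paths reproducible across runs.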



